The strong performance of neural networks in sensitive fields increases the need for interpretable deep learning models. A major challenge is to uncover the multiscale and distributed representation hidden inside the basket mappings of deep neural networks. Researchers have tried to understand it through visual analysis of features, mathematical structures, or other data-driven approaches. Here, we work on implementation invariances of CNN-based representations and present an analytical binary prototype that provides useful insights for large-scale real-life applications. We begin by unfolding a conventional CNN and then repacking it with a more transparent representation. Inspired by the success of neural networks, we present our findings as a three-layer model. The first layer is a representation layer that encompasses both the class information (group invariant) and the symmetric transformations (group equivariant) of input images. Through these transformations, we decrease the intra-class distance and increase the inter-class distance. The representation is then passed through a dimension reduction layer followed by a classifier. The proposed representation is compared with the equivariance of the AlexNet (CNN) internal representation for better dissemination of simulation results. We foresee the following immediate advantages of this toy version: i) it contributes a pre-processing step that increases feature or class separability in large-scale problems, ii) it helps in designing neural architectures that improve classification performance in multi-class problems, and iii) it helps in building interpretable CNNs through scalable functional blocks.
Translated by Google Translate
Soft actuators have attracted a great deal of interest in rehabilitative and assistive robotics because they increase safety and lower costs compared to rigid-body robotic systems. During actuation, soft actuators experience high levels of deformation, which can cause microscale fractures in their elastomeric structure. These fractures fatigue the system over time and eventually lead to macroscale damage and failure. This paper reports finite element modeling (FEM) of pneu-nets at high bending angles, along with repetitive experimentation at high deformation rates, in order to study how fatigue behaves in soft robotic actuators and how it causes deviation from the ideal behavior. Comparing the FEM model with experimental data, we show that FEM can model the performance of the actuator before fatigue up to a bending angle of 167 degrees with ~96% accuracy. We also show that the accuracy of the FEM model drops to 80% due to fatigue after repetitive high-angle bending. The results of this paper objectively highlight the emergence of fatigue over cyclic activation of the system and the resulting deviation from the computational FEM model. Such behavior can be taken into account in future controllers to adapt the system to the time-variable and non-autonomous response dynamics of soft robots.
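Transformer-based architectures have shown notable results on many downstream tasks, including question answering. The limited availability of data, on the other hand, hinders reaching reasonable performance for low-resource languages. In this paper, we investigate the applicability of pre-trained multilingual models for improving question answering performance in low-resource languages. We tested four combinations of language and task adapters using multilingual transformer architectures on seven languages similar to the MLQA dataset. In addition, we propose zero-shot transfer learning for low-resource question answering using language and task adapters. We observed that stacking the language and task adapters significantly improves the performance of multilingual transformer models on low-resource languages.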
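The amount of information available on the Internet has increased over the past decade. This digitization has created a need for automated answering systems that extract fruitful information from redundant and transitional knowledge sources. These systems are designed to use natural language understanding (NLU) to find the most salient answer to a user query from this giant knowledge source, and therefore depend on the question answering (QA) field. Question answering involves, but is not limited to, steps such as mapping the user question to a pertinent query, retrieving relevant information, and finding the best-suited answer from the retrieved information. Recent improvements in deep learning models have brought compelling performance gains on all of these tasks. In this review, the research directions of the QA field are analyzed according to question type, answer type, source of evidence for the answer, and modeling approach. This is followed by open challenges in areas such as automatic question generation, similarity detection, and low resource availability for languages. Finally, a survey of available datasets and evaluation measures is presented.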
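Automatic segmentation of skin lesions is a challenging task because of irregular lesion boundaries, poor contrast between the lesion and the background, and the presence of artifacts. In this work, a new convolutional neural network-based approach is proposed for skin lesion segmentation. A novel multi-scale feature extraction module is proposed to extract more discriminative features for dealing with the challenges of complex skin lesions; this module is embedded in the UNet, replacing the convolutional layers of the standard architecture. In addition, two different attention mechanisms refine the features extracted by the encoder and the features obtained after upsampling. The method is evaluated on two publicly available datasets, ISBI2017 and ISIC2018. It reports accuracy, recall, and JSI of 97.5%, 94.29%, and 91.16% on the ISBI2017 dataset and 95.92%, 95.37%, and 91.52% on the ISIC2018 dataset. It outperforms existing methods and the top-ranked models in the respective competitions.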
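The three reported metrics (accuracy, recall, and JSI, the Jaccard similarity index) can be illustrated with a minimal sketch that assumes binary masks flattened into lists of 0/1 pixel labels. The function name `segmentation_metrics` and the toy masks are hypothetical and do not come from the paper; the formulas are the standard pixel-wise definitions.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two flat binary masks (1 = lesion, 0 = background).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("masks must be non-empty and of equal length")

    # Confusion-matrix counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    accuracy = (tp + tn) / len(pred)
    # Recall: fraction of true lesion pixels that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Jaccard index: intersection over union of the two lesion regions.
    union = tp + fp + fn
    jsi = tp / union if union else 1.0
    return {"accuracy": accuracy, "recall": recall, "jsi": jsi}


# Toy example: 6 pixels, 2 true positives, 1 false positive, 1 false negative.
metrics = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
print(metrics)
```

Accuracy counts background pixels as successes, so it tends to be higher than JSI when lesions occupy a small part of the image, which is consistent with the ordering of the reported numbers.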
Designing experiments often requires balancing learning about the true treatment effects against earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
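The learning-versus-earning trade-off can be illustrated with a small simulation. Note that the sketch below does not implement the Gittins Index or the modified rule described above. It swaps in Thompson sampling with a Gamma prior on each arm's exponential rate, purely as a simple stand-in adaptive rule to compare against equal (non-adaptive) allocation. The function name `simulate`, the prior, the arm means, and the sample size are all assumptions made for illustration.

```python
import random


def simulate(true_means, n_participants, adaptive, seed=0):
    """Allocate participants to arms with exponentially distributed rewards.

    adaptive=False: fixed round-robin (equal) allocation.
    adaptive=True:  Thompson sampling (a stand-in, NOT the Gittins Index rule).
    Returns (total reward earned, participants per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # participants allocated to each arm
    sums = [0.0] * k   # sum of observed rewards per arm
    total = 0.0

    for i in range(n_participants):
        if adaptive:
            # Posterior of each arm's rate under an assumed Gamma(1, rate=1) prior:
            # Gamma(shape = 1 + n, rate = 1 + sum of rewards).
            # random.gammavariate takes a scale parameter, hence the reciprocal.
            sampled_rates = [
                rng.gammavariate(1 + counts[a], 1.0 / (1.0 + sums[a]))
                for a in range(k)
            ]
            # Mean reward is 1 / rate, so the smallest sampled rate looks best.
            arm = min(range(k), key=lambda a: sampled_rates[a])
        else:
            arm = i % k

        reward = rng.expovariate(1.0 / true_means[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts


equal_total, equal_counts = simulate([1.0, 3.0], 2000, adaptive=False)
adaptive_total, adaptive_counts = simulate([1.0, 3.0], 2000, adaptive=True)
print("equal allocation:   ", round(equal_total), equal_counts)
print("adaptive allocation:", round(adaptive_total), adaptive_counts)
```

With a large gap between arm means, the adaptive rule sends most participants to the better arm and earns more in total, while equal allocation splits participants evenly regardless of the observed rewards.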
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such robots in domestic settings is still largely a research topic. This paper discusses the design and virtual simulation of such a robot, capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. Speech emotion detection was not as performant as ERANNs or Zeta Policy learning, but still reached an accuracy of 63.5%. The video emotion detection system produced results that are almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. In practice, however, PCDs are often incomplete because objects are viewed from few and sparse viewpoints before the grasping action, which leads to wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and to point permutation, so it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
When robots learn reward functions using high-capacity models that take raw state directly as input, they need to learn both a representation for what matters in the task (the task "features") and how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data and fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation and what is spurious, as well as which aspects of behavior can be compressed together and which cannot. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so that we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
and widely used information measurement metric, particularly popularized for SSVEP-based Brain-Computer Interfaces (BCI). By combining speed and accuracy into a single-valued parameter, this metric aids in the evaluation and comparison of various target identification algorithms across different BCI communities. To accurately depict performance and inspire an end-to-end design for future BCI systems, a more thorough examination and definition of ITR is therefore required. We model the symbiotic communication medium, hosted by the retinogeniculate visual pathway, as a discrete memoryless (DM) channel and use the modified capacity expressions to redefine the ITR. We use graph theory to characterize the relationship between the asymmetry of the transition statistics and the ITR gain under the new definition, leading to potential bounds on data rate performance. We compare two cutting-edge target identification methods on two well-known SSVEP datasets. Results indicate that the induced DM channel asymmetry has a greater impact on the actual perceived ITR than the change in input distribution. Moreover, the ITR gain under the new definition is shown to be inversely correlated with the asymmetry in the channel transition statistics. Individual input customizations are further shown to yield perceived ITR performance improvements. An algorithm is proposed to find the capacity of binary classification, and further discussion is given on extending such results to ensemble techniques. We anticipate that the results of our study will contribute to the characterization of the highly dynamic BCI channel capacities, performance thresholds, and improved BCI stimulus designs for a tighter symbiosis between the human brain and computer systems, while enhancing the efficiency of the underlying communication resources.
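For context, the sketch below computes the conventional (Wolpaw) ITR and, separately, the mutual information of a discrete memoryless channel given an input distribution and a transition matrix. It is a minimal illustration of the standard textbook definitions, not the modified capacity expressions or the redefined ITR proposed in the paper. The function names and the example channel are hypothetical.

```python
import math


def wolpaw_itr(n_targets, accuracy, trial_seconds):
    """Conventional ITR in bits per minute.

    Assumes equiprobable targets and errors spread uniformly over the
    remaining targets (the usual assumptions behind this formula).
    """
    if n_targets < 2 or not 0.0 <= accuracy <= 1.0 or trial_seconds <= 0:
        raise ValueError("invalid arguments")
    p = accuracy
    bits = math.log2(n_targets)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits * 60.0 / trial_seconds


def mutual_information(input_dist, transition):
    """I(X;Y) in bits per channel use, with transition[x][y] = P(y | x)."""
    n_out = len(transition[0])
    # Output distribution P(y) = sum_x P(x) P(y | x).
    out_dist = [
        sum(input_dist[x] * transition[x][y] for x in range(len(input_dist)))
        for y in range(n_out)
    ]
    mi = 0.0
    for x, px in enumerate(input_dist):
        for y, pyx in enumerate(transition[x]):
            if px > 0.0 and pyx > 0.0:
                mi += px * pyx * math.log2(pyx / out_dist[y])
    return mi


# Symmetric 4-target channel with 80% accuracy and uniform inputs: the mutual
# information per selection matches the Wolpaw bits per selection.
n, p = 4, 0.8
symmetric = [[p if x == y else (1 - p) / (n - 1) for y in range(n)] for x in range(n)]
print(mutual_information([1 / n] * n, symmetric))
print(wolpaw_itr(n, p, trial_seconds=60.0))  # 60 s per trial, so bits/min = bits/selection

# An asymmetric channel (hypothetical numbers) breaks the symmetry assumption,
# so the conventional formula no longer describes the channel exactly.
asymmetric = [[0.95, 0.05], [0.30, 0.70]]
print(mutual_information([0.5, 0.5], asymmetric))
```

The two quantities agree only when the transition matrix is symmetric and the inputs are uniform, which is why asymmetry in the transition statistics matters when the channel view is adopted.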